OPTIMAL DISCRIMINATION DESIGNS

By Holger Dette and Stefanie Titoff 

Ruhr-Universität Bochum

We consider the problem of constructing optimal designs for model 
discrimination between competing regression models. Various new 
properties of optimal designs with respect to the popular T-optimality 
criterion are derived, which in many circumstances allow an explicit 
determination of T-optimal designs. It is also demonstrated that in
nested linear models the number of support points of T-optimal designs
is usually too small to estimate all parameters in the extended
model. In many cases T-optimal designs are not unique, and
in this situation we give a characterization of all T-optimal designs.
Finally, T-optimal designs are compared with optimal discriminat- 
ing designs with respect to alternative criteria by means of a small 
simulation study. 

1. Introduction. Optimal designs are frequently criticized because they 
are constructed from particular model assumptions before the data can be 
collected. Often there exist several plausible models which may be appro- 
priate for a fit to the data. Therefore, in many applications, the data is first 
used to identify an appropriate model from a class of competing models and 
in a second step the same data is analyzed with the identified model. While 
the optimal design problem for the latter task has been considered by numerous
authors (see, e.g., the monographs of Silvey [32], Pázman [26], Atkinson
and Donev [2] or Pukelsheim [27]), much less attention has been paid to the
problem of designing experiments for model discrimination. Early work was
done by Stigler [34] and Studden [38], who determined optimal designs for
discriminating between two nested univariate polynomials. The corresponding
optimal design is called the $D_s$-optimal design and minimizes the volume of
the confidence ellipsoid for the parameters corresponding to the extension
of the smaller model. This criterion directly refers to a likelihood ratio test
and was discussed by numerous authors (see, e.g., Spruill [33], Dette [10],
Dette and Haller [12] or Song and Wong [35], among others). Atkinson and
Fedorov [3, 4] proposed an alternative criterion, which determines a design
such that the sum of squares for a lack of fit test is large. This optimality
criterion is now called the T-criterion in the statistical literature and
has been considered by several authors, mostly in the context of regression
models (see, e.g., Uciński and Bogacka [39], López-Fidalgo, Tommasi and
Trandafir [24] or Waterhouse et al. [40] for some recent references). The $D_s$-
and T-optimality criteria have been studied separately without exploring
the differences between both philosophies of constructing optimal designs
for model discrimination.

The present paper makes an attempt to explore some relations between
the, at first glance, rather different concepts of constructing discrimination
designs. In Section 2 we discuss some new properties of T-optimal
designs and relate the T-optimal design problem to a problem of nonlinear
approximation theory. In general, T-optimal designs are not unique, and
in such cases we present an explicit characterization of the class of all
T-optimal designs. In Section 3 the special case is considered where one of the
competing models is linear, and here it turns out that T-optimal designs
are in fact $D_1$-optimal (in the sense of Stigler [34]) in an extended linear
regression model. This relation is then used to derive several new properties
of T-optimal designs, especially bounds on the number of support points.
In particular, it is demonstrated that in many cases the T-criterion yields
designs which cannot be used to estimate all parameters in the extended
model. Section 4 gives some more insight into the case of nonlinear regression
models and also contains an extension of the results to T-optimality-type
criteria, which are based on the Kullback-Leibler distance and have
recently been proposed by López-Fidalgo, Tommasi and Trandafir [24].
Finally, in Section 5 several examples are presented to illustrate the
theoretical results. In particular, the mean squared error of parameter estimates
and the power of tests based on T- and $D_s$-optimal designs are investigated
by means of a simulation study.

2. New properties of T-optimal designs. We consider the common nonlinear
regression model

(2.1)  $Y = \eta(x,\theta) + \varepsilon,$

where $\theta \in \Theta \subset \mathbb{R}^m$ is the vector of unknown parameters, and different
observations are assumed to be independent. The errors are normally distributed
with mean 0 and variance $\sigma^2$. In (2.1) the variable $x$ denotes the explanatory
variable, which varies in the design space $\mathcal{X}$ (a more general situation
with nonnormal, heteroscedastic errors is discussed in Section 4.2). We assume
that $\eta$ is a continuous and real-valued function of both arguments
$(x,\theta) \in \mathcal{X} \times \Theta$ and a design is defined as a probability measure $\xi$ on $\mathcal{X}$
with finite support (see Kiefer [21]). If the design $\xi$ has masses $w_i$ at the
points $x_i$ ($i = 1,\ldots,k$) and $n$ observations can be made by the experimenter,
this means that the quantities $w_i n$ are rounded to integers, say $n_i$, satisfying
$\sum_{i=1}^k n_i = n$, and the experimenter takes $n_i$ observations at each location
$x_i$ ($i = 1,\ldots,k$). There are numerous criteria to discriminate between
competing designs, if parameter estimation in a given model is the main
objective for the construction of the design (see Silvey [32], Pázman [26] or
Pukelsheim [27], among others), but much less attention has been paid to the
problem of developing optimal designs for model discrimination. Early work
was done by Hunter and Reiner [17], Box and Hill [5] and Stigler [34]. A review
on discrimination designs can be found in Hill [18]. Stigler [34] proposed
a $D_s$-criterion for discriminating between two competing (nested) models.
Roughly speaking, the $D_s$-optimal design yields small variances of the parameter
estimates in an "extended" model. To be precise, consider the case of
two rival models for the mean effect in the nonlinear regression model (2.1),
say $\eta_1(x,\theta_{(1)})$ and $\eta_2(x,\theta_{(2)})$ with $\theta_{(j)} \in \Theta_{(j)} \subset \mathbb{R}^{m_j}$ ($m_j \in \mathbb{N}$, $j = 1,2$). We
assume the model $\eta_1(x,\theta_{(1)})$ is an extension of the model $\eta_2(x,\theta_{(2)})$. In other
words, if the last $m_0 = m_1 - m_2$ components of the vector $\theta_{(1)} = (\theta_{(2)}^T,\theta_{(0)}^T)^T$
vanish we obtain the model $\eta_2$, that is, $\eta_1(x,(\theta_{(2)}^T,0^T)^T) = \eta_2(x,\theta_{(2)})$, where
$0$ denotes the $(m_1 - m_2)$-dimensional vector with all components identical
0. The $D_{m_1-m_2}$-optimality criterion is defined by the expression



respectively. A $D_{m_1-m_2}$-optimal design maximizes the function $\Phi_{D_{m_1-m_2}}$
in the class of all designs satisfying $\mathrm{Range}(K) \subset \mathrm{Range}(M_{m_1}(\xi))$, where
the matrix $K$ is defined by $K^T = (0, I_{m_1-m_2}) \in \mathbb{R}^{(m_1-m_2)\times m_1}$,
$I_{m_1-m_2} \in \mathbb{R}^{(m_1-m_2)\times(m_1-m_2)}$ denotes the identity matrix and $0$ denotes the
$(m_1-m_2)\times m_2$ matrix with all entries identical 0. The criterion is motivated
by the likelihood ratio test for the hypothesis

(2.3)  $H_0 : K^T\theta_{(1)} = 0.$

Because the volume of the confidence ellipsoid for the parameter $K^T\theta_{(1)}$
is minimized if the function $\Phi_{D_{m_1-m_2}}(\xi)$ is maximized with respect to $\xi$
(see Pukelsheim [27]), we expect that a $D_{m_1-m_2}$-optimal design yields good
power for the test of the hypothesis (2.3). The T-optimality criterion was
introduced by Atkinson and Fedorov [3, 4] as a criterion which directly
reflects the goal of model discrimination in the design of experiments and
has found considerable interest in the recent literature (see, e.g., Uciński and
Bogacka [39], López-Fidalgo, Tommasi and Trandafir [24] or Waterhouse et
al. [40], among many others). It does not necessarily refer to nested models
and assumes that one model, say $\eta = \eta_1$, is fixed. The T-optimality criterion
determines the design such that the expression

(2.4)  $\Delta(\xi) = \inf_{\theta_{(2)}\in\Theta_{(2)}} \int_{\mathcal{X}} (\eta(x) - \eta_2(x,\theta_{(2)}))^2 \, d\xi(x)$

is maximal. The statistical interpretation of the T-optimality criterion is
as follows. Assume that we are interested in the problem of testing the
hypothesis $H_0 : \eta = \eta_1$ versus $H_1 : \eta = \eta_2$, which corresponds in the context
of nested models to the hypotheses

(2.5)  $H_0 : \theta_{(1)} = (\theta_{(2)}^T, 0^T)^T$ versus $H_1 : \theta_{(1)} \neq (\theta_{(2)}^T, 0^T)^T.$

Under local alternatives of the form 6(^\„ = it follows that the 

noncentrality parameter of the corresponding likelihood ratio test up to the
factor $\sigma^{-2}$ is given by

where $M_{11.2}(\xi)$ denotes the Schur complement of the matrix $M_{m_2}(\xi)$ in
$M_{m_1}(\xi)$ and a straightforward calculation shows that

$\delta^2 = \Delta(\xi) + o(1),$

where the function $\eta$ in (2.4) is given by $\eta(\cdot) = \eta_1(\cdot,(\theta_{(2)}^T,\theta_{(0)}^T)^T)$. Thus a
T-optimal design maximizes the power of the likelihood ratio test with respect
to local alternatives.
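For a rival model that is linear in its parameters, the infimum in (2.4) is a weighted least squares problem, so the criterion value of a given design can be evaluated directly. The following is a minimal sketch under that assumption; the names `solve`, `t_criterion`, `cubic` and `line_basis` are illustrative and not taken from the paper, and the cubic-versus-line setting is borrowed from Example 2.3 of this section purely as test data.

```python
from typing import Callable, List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting.

    Assumes the matrix is nonsingular (no safeguard against zero pivots)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def t_criterion(
    eta: Callable[[float], float],
    f: Callable[[float], List[float]],
    points: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, List[float]]:
    """Return (criterion value, minimizing theta) for the linear rival theta^T f(x)."""
    k = len(f(points[0]))
    pairs = list(zip(points, weights))
    # Weighted normal equations: (sum w f f^T) theta = sum w f eta.
    info = [[sum(w * f(x)[i] * f(x)[j] for x, w in pairs) for j in range(k)]
            for i in range(k)]
    rhs = [sum(w * f(x)[i] * eta(x) for x, w in pairs) for i in range(k)]
    theta = solve(info, rhs)
    # Weighted residual sum of squares at the minimizer.
    delta = sum(
        w * (eta(x) - sum(t * fi for t, fi in zip(theta, f(x)))) ** 2
        for x, w in pairs
    )
    return delta, theta


def cubic(x: float) -> float:
    return 1 + x + x ** 3


def line_basis(x: float) -> List[float]:
    return [1.0, x]


support = [-1.0, -0.5, 0.5, 1.0]
delta_opt, theta_opt = t_criterion(cubic, line_basis, support, [1 / 6, 1 / 3, 1 / 3, 1 / 6])
delta_uniform, _ = t_criterion(cubic, line_basis, support, [0.25] * 4)
```

Comparing `delta_opt` with `delta_uniform` shows how the weights alone change the criterion on a fixed support; the sketch does not search over designs.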

The $L^2$-distance in (2.4) corresponds to the assumption of a normally
distributed, homoscedastic error, and alternative metrics could be used reflecting
different assumptions regarding the error distribution and variance structure.
For example, López-Fidalgo, Tommasi and Trandafir [24] recently proposed
a Kullback-Leibler distance, which corresponds to the likelihood ratio
test for the hypothesis $H_0 : \eta_1 = \eta_2$ versus $H_1 : \eta_1 \neq \eta_2$ under different
distributional assumptions. In the present paper we will restrict ourselves to the
criteria (2.2) and (2.4), but mention possible extensions of our results in the
second part of Section 4.
Note that the T-optimality criterion, and in the case of nonlinear regression
models also the $D_s$-optimality criterion, depends on the unknown
parameter $\theta_{(1)}$, which may be difficult to choose in concrete applications.
However, a robust version of the two optimality criteria can easily be obtained
by applying a sequential, Bayesian or (standardized) maximin approach
(see, e.g., Atkinson and Fedorov [3], Müller and Pázman [25], Dette and
Neugebauer [13, 14] or Dette [11], among many others).

For the following discussion consider the kernel

(2.6)  $\Delta(\theta_{(2)},\xi) = \int_{\mathcal{X}} (\eta(x) - \eta_2(x,\theta_{(2)}))^2 \, d\xi(x)$

and define for a continuous (real-valued) function $f$ on the design space
$\mathcal{X}$ its sup-norm by $\|f\|_\infty = \sup_{x\in\mathcal{X}} |f(x)|$. Throughout this paper it is assumed
that the infimum of (2.6) with respect to $\theta_{(2)}$ is attained for some $\theta^*_{(2)} \in \Theta_{(2)}$ and that
a T-optimal design exists. Moreover, we assume that the regression functions
$\eta_1$ and $\eta_2$ are differentiable with respect to the second argument. Our
first result characterizes a T-optimal design as the solution of a nonlinear
approximation problem.

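Theorem 2.1.

$\sup_{\xi} \Delta(\xi) = \sup_{\xi} \inf_{\theta_{(2)}\in\Theta_{(2)}} \Delta(\theta_{(2)},\xi) = \inf_{\theta_{(2)}\in\Theta_{(2)}} \|\eta - \eta_2(\cdot,\theta_{(2)})\|_\infty^2.$

Moreover, if $\xi^*$ denotes a T-optimal design and $\theta^*_{(2)}$ any value corresponding
to the minimum of $\Delta(\theta_{(2)},\xi^*)$ with respect to $\theta_{(2)} \in \Theta_{(2)}$, then $\theta^*_{(2)}$
corresponds to a best uniform approximation of $\eta$ by the functions $\eta_2(\cdot,\theta_{(2)})$,
that is,

$\inf_{\theta_{(2)}\in\Theta_{(2)}} \|\eta - \eta_2(\cdot,\theta_{(2)})\|_\infty = \|\eta - \eta_2(\cdot,\theta^*_{(2)})\|_\infty,$

$\Delta(\xi^*) = \|\eta - \eta_2(\cdot,\theta^*_{(2)})\|_\infty^2$ and

(2.7)  $\mathrm{supp}(\xi^*) \subset \mathcal{A} := \{x \in \mathcal{X} \mid |\eta(x) - \eta_2(x,\theta^*_{(2)})| = \|\eta - \eta_2(\cdot,\theta^*_{(2)})\|_\infty\}.$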

Proof. A straightforward calculation shows that

$\sup_{\xi} \Delta(\xi) = \sup_{\xi} \inf_{\theta_{(2)}\in\Theta_{(2)}} \Delta(\theta_{(2)},\xi)$
$\le \inf_{\theta_{(2)}\in\Theta_{(2)}} \sup_{x\in\mathcal{X}} |\eta(x) - \eta_2(x,\theta_{(2)})|^2 = \inf_{\theta_{(2)}\in\Theta_{(2)}} \|\eta - \eta_2(\cdot,\theta_{(2)})\|_\infty^2.$

On the other hand, $\theta^*_{(2)}$ minimizes the function defined by (2.6) with
$\xi = \xi^*$ in the set $\Theta_{(2)}$ and therefore we obtain from the equivalence theorem for
T-optimality (see, e.g., Atkinson and Fedorov [3])

$\inf_{\theta_{(2)}\in\Theta_{(2)}} \|\eta - \eta_2(\cdot,\theta_{(2)})\|_\infty^2 \le \|\eta - \eta_2(\cdot,\theta^*_{(2)})\|_\infty^2 = \Delta(\xi^*) = \sup_{\xi} \Delta(\xi),$

which proves the first assertion of the theorem. For a proof of the second
part assume that the design $\xi^*$ is a T-optimal design and that $\theta^*_{(2)}$ minimizes
the function $\Delta(\theta_{(2)},\xi^*)$; then the function $|\eta(x) - \eta_2(x,\theta^*_{(2)})|$ attains its maximum
at any support point of $\xi^*$ (see Atkinson and Fedorov [3]) and $\theta^*_{(2)}$
corresponds to a best uniform approximation of the function $\eta$ by functions
of the form $\eta_2(\cdot,\theta_{(2)})$. Therefore, the assertion follows. □

Theorem 2.1 links the T-optimal design problem to a problem in nonlinear
approximation theory, which will be further discussed in Sections 3 and 4.
Note that the theorem provides a saddle point property of the point $(\theta^*_{(2)},\xi^*)$
although the kernel $\Delta(\theta_{(2)},\xi)$ is in general not convex as a function of $\theta_{(2)}$.
The result is particularly useful if the best uniform approximation of the
function $\eta$ by functions of the form $\eta_2(\cdot,\theta_{(2)})$ is unique, say $\eta_2(\cdot,\bar\theta_{(2)})$. In this
case, the set $\mathcal{A}$ in (2.7) is independent of the design $\xi^*$ and the following
result allows us to characterize all T-optimal designs.



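Theorem 2.2. Assume that the parameter $\bar\theta_{(2)}$ corresponding to the best
uniform approximation of the function $\eta$ by functions of the form $\eta_2(\cdot,\theta_{(2)})$
is unique and an interior point of the set $\Theta_{(2)}$.

(a) If a design $\xi^*$ is T-optimal, then

(2.8)  $\int_{\mathcal{A}} (\eta(x) - \eta_2(x,\bar\theta_{(2)})) \, \frac{\partial}{\partial\theta_{(2)}} \eta_2(x,\theta_{(2)}) \Big|_{\theta_{(2)}=\bar\theta_{(2)}} \, d\xi^*(x) = 0.$

(b) Conversely, assume that a design $\xi^*$ satisfies (2.8), $\mathrm{supp}(\xi^*) \subset \mathcal{A}$ and
that the minimum of the function

(2.9)  $\theta_{(2)} \mapsto \int_{\mathcal{A}} (\eta(x) - \eta_2(x,\theta_{(2)}))^2 \, d\xi^*(x)$

is attained at a unique point in the interior of $\Theta_{(2)}$; then the design $\xi^*$ is
T-optimal.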



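Proof. For a proof of part (a) we note that by Theorem 2.1 we have
$\theta^*_{(2)} = \bar\theta_{(2)}$ and $\mathrm{supp}(\xi^*) \subset \mathcal{A}$ for any T-optimal design $\xi^*$. Consequently, we
obtain

$\Delta(\xi^*) = \inf_{\theta_{(2)}\in\Theta_{(2)}} \int_{\mathcal{A}} (\eta(x) - \eta_2(x,\theta_{(2)}))^2 \, d\xi^*(x)$

and the assertion follows because $\theta^*_{(2)} = \bar\theta_{(2)}$ corresponds to the (unique)
minimum of the function on the right-hand side.

For a proof of part (b) assume that $\mathrm{supp}(\xi^*) \subset \mathcal{A}$; then it follows from
Theorem 2.1

$\sup_{\xi} \Delta(\xi) = \|\eta - \eta_2(\cdot,\bar\theta_{(2)})\|_\infty^2 = \int_{\mathcal{A}} (\eta(x) - \eta_2(x,\bar\theta_{(2)}))^2 \, d\xi^*(x)$
$= \inf_{\theta_{(2)}\in\Theta_{(2)}} \int_{\mathcal{A}} (\eta(x) - \eta_2(x,\theta_{(2)}))^2 \, d\xi^*(x)$

because the parameter $\bar\theta_{(2)}$ corresponds to the unique minimum of the function
(2.9). □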

Roughly speaking, Theorem 2.2 provides a characterization of all T-optimal
designs by a system of linear equations, if the parameter $\bar\theta_{(2)}$ corresponding
to the best approximation is unique, an interior point of the set $\Theta_{(2)}$ and if
the cardinality of the set $\mathcal{A}$ defined in (2.7) is finite. If $\bar\theta_{(2)}$ is a boundary
point of $\Theta_{(2)}$, an extension of condition (2.8) can easily be derived using
Lagrangian multipliers.
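When the set in (2.7) is finite, condition (2.8) together with the requirement that the weights sum to one is a linear system in the design weights. The sketch below illustrates this under a simplifying assumption that is not part of the theorem: the candidate set has exactly two more points than the rival model has parameters, so fixing one weight `p` leaves a square system. The helper names (`solve`, `design_weights`) are hypothetical, and the numbers come from the cubic-versus-line setting of Example 2.3, where the residual of the best approximation equals -1/4, 1/4, -1/4, 1/4 at the points -1, -1/2, 1/2, 1.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (assumes a nonsingular matrix)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_weights(
    residuals: Sequence[float],
    gradients: Sequence[Sequence[float]],
    p: float,
) -> List[float]:
    """Weights w with w[0] = p that satisfy sum_i w_i r_i grad_i = 0 and sum_i w_i = 1.

    residuals[i] is the approximation error at candidate point i and
    gradients[i] is the gradient of the rival model at that point."""
    k, m = len(residuals), len(gradients[0])
    if k != m + 2:
        raise ValueError("sketch only covers k = m + 2 candidate points")
    rows, rhs = [], []
    for l in range(m):  # one equation per rival-model parameter
        rows.append([residuals[i] * gradients[i][l] for i in range(1, k)])
        rhs.append(-p * residuals[0] * gradients[0][l])
    rows.append([1.0] * (k - 1))  # weights sum to one
    rhs.append(1.0 - p)
    return [p] + solve(rows, rhs)


support = [-1.0, -0.5, 0.5, 1.0]
resid = [-0.25, 0.25, -0.25, 0.25]
grads = [[1.0, x] for x in support]  # gradient of theta1 + theta2 * x
example_weights = design_weights(resid, grads, 0.1)
```

Only values of `p` for which all returned weights are nonnegative correspond to designs; the sketch does not enforce this.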

In many applications the best uniform approximation of the function $\eta$
by functions of the form $\eta_2(\cdot,\theta_{(2)})$ is in fact unique, and sufficient conditions
for this property can be found in the books of Rice [30] or Braess [7]. Note that
there is an additional assumption in part (b) of Theorem 2.2 concerning
the minimum of the function defined in (2.9). Whether this assumption is
satisfied depends on the function $\eta$ and the parameter
set $\Theta_{(2)} \subset \mathbb{R}^{m_2}$. For example, in the linear case, that is $\eta_2(x,\theta_{(2)}) = \theta_{(2)}^T f(x)$
[for an appropriate vector of regression functions $f(x)$], this assumption is
always satisfied, because the Hesse matrix of $\Delta(\theta_{(2)},\xi)$ with respect to the
parameter $\theta_{(2)}$ is given by

$\frac{\partial^2}{\partial\theta_{(2)}^2} \Delta(\theta_{(2)},\xi) = 2 \int_{\mathcal{A}} f(x) f^T(x) \, d\xi(x),$

and therefore positive definite, if the design $\xi$ has more than $m_2$ support
points.

An exchange type algorithm for the computation of T-optimal designs was
proposed by Atkinson and Fedorov [3]. Theorem 2.2 suggests an alternative
method to determine T-optimal designs. In a first step the best uniform
approximation of the function $\eta$ by functions of the form $\eta_2(\cdot,\theta_{(2)})$ is determined.
For this calculation the Remes exchange algorithm could be used in
many cases, which is a common tool in approximation theory (see Rice [30],
Vol. 1, pages 171-180). The algorithm also yields the set $\mathcal{A}$ of all possible
support points of T-optimal designs defined in (2.7) and will be illustrated in
the following example. Secondly, the system of equations in (2.8) is solved
to characterize all T-optimal designs. In contrast to the method proposed
by Atkinson and Fedorov [3], this approach yields all T-optimal designs.




Example 2.3. Consider the T-optimal design problem on the interval
$[-1,1]$ for the functions

(2.10)  $\eta(x) = \eta_1(x,\theta_{(1)}) = 1 + x + x^3$ and $\eta_2(x,\theta_{(2)}) = \theta_{(2)1} + \theta_{(2)2}x.$

It can be shown that the best approximation of the cubic polynomial $\eta$
by linear functions $\eta_2$ alternates at most 4 times. The Remes algorithm
starts with an initial guess for the best approximation of $\eta$, say $\eta_2(\cdot,\theta^{(0)}_{(2)})$.
Given an approximation $\eta_2(\cdot,\theta^{(k)}_{(2)})$ in the $k$th step one determines 4 points
$x_1^{(k+1)} < \cdots < x_4^{(k+1)} \in [-1,1]$ such that

(2.11)  $(\eta(x_j^{(k+1)}) - \eta_2(x_j^{(k+1)},\theta^{(k)}_{(2)})) \, (\eta(x_{j+1}^{(k+1)}) - \eta_2(x_{j+1}^{(k+1)},\theta^{(k)}_{(2)})) < 0,$

$j = 1,2,3$ [which means that the difference $\eta(x) - \eta_2(x,\theta^{(k)}_{(2)})$ has opposite
sign at the adjacent points $x_j^{(k+1)}$],

(2.12)  $\max_j |\eta(x_j^{(k+1)}) - \eta_2(x_j^{(k+1)},\theta^{(k)}_{(2)})| = \|\eta - \eta_2(\cdot,\theta^{(k)}_{(2)})\|_\infty$

[at one of the points $x_j^{(k+1)}$ the function $\eta - \eta_2(\cdot,\theta^{(k)}_{(2)})$ attains its sup-norm]
and

(2.13)  $\min_j |\eta(x_j^{(k+1)}) - \eta_2(x_j^{(k+1)},\theta^{(k)}_{(2)})| \ge \max_j |\eta(x_j^{(k)}) - \eta_2(x_j^{(k)},\theta^{(k)}_{(2)})|.$

In the next step the parameter $\theta^{(k+1)}_{(2)}$ is determined such that

$\max_j |\eta(x_j^{(k+1)}) - \eta_2(x_j^{(k+1)},\theta_{(2)})|$

is minimal [in other words, the best approximation of the function $\eta$ by
$\eta_2(\cdot,\theta_{(2)})$ with respect to the sup-norm on the set $\{x_1^{(k+1)},\ldots,x_4^{(k+1)}\}$ is
determined]. It can be shown that it is always possible to choose the points
$\{x_1^{(k+1)},\ldots,x_4^{(k+1)}\}$ such that (2.13) is satisfied (see Rice [30] and note that
it is easy to satisfy (2.11) and (2.12)). We have illustrated the performance
of the algorithm for the models in (2.10) in Table 1 and Figure 1, where
we show the parameter $\theta^{(k)}_{(2)} = (\theta^{(k)}_{(2)1},\theta^{(k)}_{(2)2})$, the points $\{x_1^{(k)},\ldots,x_4^{(k)}\}$ and the
differences $\eta - \eta_2(\cdot,\theta^{(k)}_{(2)})$. Note that the algorithm stops after a few
iterations, which is rather typical for many examples. The algorithm yields
that the best approximation is given by

$\eta(x) - \eta_2(x,\bar\theta_{(2)}) = x^3 - \tfrac{3}{4}x,$

which yields $\mathcal{A} = \{-1, -\tfrac{1}{2}, \tfrac{1}{2}, 1\}$ for the set defined in (2.7). Because all
assumptions of Theorem 2.2 are satisfied (note that the regression model $\eta_2$
is linear), the system of equations (2.8) characterizes all T-optimal designs.

Table 1
The iterations of the Remes algorithm for the calculation of the best approximation of the
function $1 + x + x^3$ by linear polynomials $\theta_{(2)1} + \theta_{(2)2}x$

k    $\theta^{(k)}_{(2)1}$    $\theta^{(k)}_{(2)2}$    $x_1^{(k)}$    $x_2^{(k)}$    $x_3^{(k)}$    $x_4^{(k)}$
0    0.994     1.075     -0.9      -0.2      0.2      0.8
1    1.0000    1.8705    -1.000    -0.153    0.153    1.000
2    1.0000    1.7514    -1.000    -0.538    0.538    1.000
3    1.0000    1.7500    -1.000    -0.500    0.500    1.000

A straightforward calculation shows that the set of all T-optimal designs is
given by the one-parametric class

(2.14)  $\xi^*_p = \begin{pmatrix} -1 & -\tfrac{1}{2} & \tfrac{1}{2} & 1 \\ p & \tfrac{1}{6} + p & \tfrac{1}{2} - p & \tfrac{1}{3} - p \end{pmatrix},$

where $p \in [0,\tfrac{1}{3}]$. The parameter $p$ could be chosen such that a further
optimality criterion (e.g., D-optimality for the cubic model) is maximized in the
class of all T-optimal designs. We finally note that the exchange type algorithm
proposed by Atkinson and Fedorov [3, 4] only yields the three-point
design $\xi^*_{1/3}$ as T-optimal design with a singular information matrix in the
cubic regression model.
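The first step of the procedure, the best uniform approximation, can be sketched in a few lines. The code below is an illustration only: it uses a textbook one-point exchange with three reference points on a finite grid, which is a simpler variant than the four-point scheme (2.11)-(2.13) described in this example, and all names (`solve`, `leveled_fit`, `exchange`, `remez_line`) as well as the grid size and starting reference are assumptions made here.

```python
from typing import Callable, List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (assumes a nonsingular matrix)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def leveled_fit(eta: Callable[[float], float], ref: List[float]) -> Tuple[float, float, float]:
    """Find (a, b, h) with eta(x_i) - a - b * x_i = (-1)**i * h on the reference."""
    rows = [[1.0, x, (-1.0) ** i] for i, x in enumerate(ref)]
    a, b, h = solve(rows, [eta(x) for x in ref])
    return a, b, h


def exchange(ref: List[float], y: float, err_y: float, h: float) -> List[float]:
    """Swap y into the reference so that the error signs keep alternating."""
    lead = 1.0 if h >= 0 else -1.0
    signs = [lead * (-1.0) ** i for i in range(len(ref))]  # error sign at ref[i]
    s = 1.0 if err_y >= 0 else -1.0
    if y < ref[0]:
        return [y] + ref[1:] if s == signs[0] else [y] + ref[:-1]
    if y > ref[-1]:
        return ref[:-1] + [y] if s == signs[-1] else ref[1:] + [y]
    new = list(ref)
    i = max(j for j, x in enumerate(ref) if x < y)  # left neighbour of y
    new[i if s == signs[i] else i + 1] = y
    return new


def remez_line(eta: Callable[[float], float], lo: float, hi: float,
               grid_size: int = 401, tol: float = 1e-12,
               max_iter: int = 100) -> Tuple[float, float, float, List[float]]:
    """Best uniform approximation of eta by a + b * x on a grid over [lo, hi]."""
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    ref = [lo, lo + 0.6 * (hi - lo), hi]  # arbitrary asymmetric starting reference
    a = b = h = 0.0
    for _ in range(max_iter):
        a, b, h = leveled_fit(eta, ref)
        err = lambda x: eta(x) - a - b * x
        y = max(grid, key=lambda x: abs(err(x)))
        if abs(err(y)) <= abs(h) + tol:  # leveled error already equals the sup-norm
            break
        ref = exchange(ref, y, err(y), h)
    return a, b, abs(h), ref


intercept, slope, sup_error, reference = remez_line(lambda x: 1 + x + x ** 3, -1.0, 1.0)
```

The returned reference contains only three of the four extremal points; the full set (2.7) would be read off afterwards as all grid points where the absolute error equals `sup_error`.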



Remark 2.4. It is worthwhile to mention that Theorems 2.1 and 2.2 do
not require the assumption of nested models. This assumption is only needed
for the statistical interpretation of the T- and $D_s$-optimality criteria.

3. $D_1$- and T-optimal designs in linear regression models. In this section
we restrict ourselves to the case where the regression model $\eta_2$ is a
linear model, that is,

(3.1)  $\eta_2(x,\theta_{(2)}) = \theta_{(2)}^T f(x),$

with $\theta_{(2)} \in \Theta_{(2)} = \mathbb{R}^{m_2}$. Note that the model $\eta = \eta_1$ is not necessarily linear
(this case will be discussed later in this section). Moreover, the two models
are not necessarily nested, except if it is stated explicitly in the following
discussion. It turns out that in this case the T-optimal design is in fact also
$D_1$-optimal in the sense of Stigler [34] for the regression model

(3.2)  $Y = \theta_{(2)}^T f(x) + \beta\eta(x) + \varepsilon.$

Fig. 1. Different iteration steps of the function $1 + x + x^3 - \theta^{(k)}_{(2)1} - \theta^{(k)}_{(2)2}x$ generated by
the Remes algorithm. Left panel $k = 0$, middle panel $k = 1$, right panel $k = 3$.

For a proof of this property let $\tilde f(x) = (f^T(x),\eta(x))^T \in \mathbb{R}^{m_2+1}$ denote the
vector of regression functions in the linear regression model (3.2), let
$e_{m_2+1} = (0,\ldots,0,1)^T \in \mathbb{R}^{m_2+1}$ be the $(m_2+1)$th unit vector and define

(3.3)  $M(\xi) = \int_{\mathcal{X}} f(x) f^T(x) \, d\xi(x),$

(3.4)  $\tilde M(\xi) = \int_{\mathcal{X}} \tilde f(x) \tilde f^T(x) \, d\xi(x),$

as the information matrices in the regression model $\eta_2$ and the extended
model (3.2), respectively. Recall that a $D_1$-optimal design in the regression
model (3.2) satisfies $e_{m_2+1} \in \mathrm{Range}(\tilde M(\xi))$ and maximizes the expression

$(e_{m_2+1}^T \tilde M^-(\xi) e_{m_2+1})^{-1} = \frac{\det \tilde M(\xi)}{\det M(\xi)}$

(see, e.g., Stigler [34] or Studden [38]). The $D_1$-optimality criterion is a
special case of the c-optimality criterion, which determines for a given vector
$c \in \mathbb{R}^{m_2+1}$ the design $\xi$ such that the expression $(c^T \tilde M^-(\xi) c)^{-1}$ is maximal
and the condition $c \in \mathrm{Range}(\tilde M(\xi))$ is satisfied (see Pukelsheim [27]). Note
also that the expression $c^T \tilde M^-(\xi) c$ is approximately proportional to the
variance of the least squares estimate of $(\theta_{(2)}^T,\beta)c$ in the regression model
(3.2) (see Pukelsheim [27]). Therefore, a $D_1$-optimal design minimizes the
variance of the least squares estimate of the coefficient $\beta$ in the extended
regression model (3.2).

Theorem 3.1. Assume that (3.1) is satisfied; then a design $\xi^*$ is
T-optimal if and only if it is $D_1$-optimal in the extended regression model
(3.2).

Proof. Let $f(x) = (f_{(2)1}(x),\ldots,f_{(2)m_2}(x))^T$ denote the vector of functions
corresponding to the first part in the linear model (3.2) and define for
continuous functions $g_1,\ldots,g_k$ ($k \in \mathbb{N}$) with domain $\mathcal{X}$ the Gram determinant
by

$G(g_1,\ldots,g_k) := \det\Big( \int_{\mathcal{X}} g_i(x) g_j(x) \, d\xi(x) \Big)_{i,j=1}^{k}.$

Then a standard result from Hilbert space theory (see Achiezer [1], page 16)
shows that

$\Delta(\xi) = \frac{G(\eta, f_{(2)1}, f_{(2)2},\ldots,f_{(2)m_2})}{G(f_{(2)1}, f_{(2)2},\ldots,f_{(2)m_2})} = \frac{\det \tilde M(\xi)}{\det M(\xi)},$

which proves the assertion. □
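The ratio of Gram determinants used in this proof is easy to check numerically for a concrete design. The following sketch assumes the cubic-versus-line setting of Example 2.3 and a design with weights 1/6, 1/3, 1/3, 1/6 on the points -1, -1/2, 1/2, 1; the function names `det`, `gram` and `gram_ratio` are chosen here for illustration and do not come from the paper.

```python
from typing import Callable, List, Sequence

Func = Callable[[float], float]


def det(a: List[List[float]]) -> float:
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, pivot in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total


def gram(funcs: Sequence[Func], points: Sequence[float],
         weights: Sequence[float]) -> List[List[float]]:
    """Gram matrix of the functions with respect to the discrete design measure."""
    return [[sum(w * g(x) * h(x) for x, w in zip(points, weights)) for h in funcs]
            for g in funcs]


def gram_ratio(eta: Func, basis: Sequence[Func], points: Sequence[float],
               weights: Sequence[float]) -> float:
    """det of the extended information matrix divided by det of the rival-model one."""
    extended = list(basis) + [eta]
    return det(gram(extended, points, weights)) / det(gram(basis, points, weights))


basis = [lambda x: 1.0, lambda x: x]
cubic = lambda x: 1 + x + x ** 3
ratio = gram_ratio(cubic, basis, [-1.0, -0.5, 0.5, 1.0], [1 / 6, 1 / 3, 1 / 3, 1 / 6])
```

For this design the ratio can be compared with the squared sup-norm of $x^3 - \tfrac{3}{4}x$ on $[-1,1]$, which is the value that Theorem 2.1 assigns to a T-optimal design in this setting.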

In the case where the model $\eta_1(\cdot,\theta_{(1)})$ is also linear, an alternative
representation for the criterion $\Delta(\xi)$ was given in Section 4.2 of Atkinson and
Fedorov [3]. Theorem 3.1 provides a different interpretation of the T-optimality
criterion and does not require the assumption of a linear model $\eta_1(\cdot,\theta_{(1)})$. In
the following we derive several important conclusions from Theorem 3.1. We
begin with a general result on the number of support points of T-optimal
designs, which is a direct consequence of Corollary 8.3 in Pukelsheim [27].
Roughly speaking, the number of support points of the T-optimal design is
at most $m_2 + 1$, independently of the dimension $m_1$ of the parameter
corresponding to the model $\eta_1(\cdot,\theta_{(1)})$.

Corollary 3.2. Assume that (3.1) is satisfied; then there exists a
T-optimal design $\xi^*$ with $m_2 + 1$ support points.
We now present a refinement of this result in the case where the design
space is an interval, say $I \subset \mathbb{R}$, and the regression functions in model (3.2)
form a Chebyshev system (see Karlin and Studden [20]). In many cases
(with a minor additional assumption) the T-optimal design is supported at
precisely $m_2 + 1$ well defined points, which correspond to the system under
consideration and can be found explicitly. To be precise, recall that a set
of $k$ functions $h_1,\ldots,h_k : I \to \mathbb{R}$ is called a weak Chebyshev system (on the
interval $I$) if there exists an $\varepsilon \in \{-1,1\}$ such that the inequality

(3.5)  $\varepsilon \cdot \begin{vmatrix} h_1(x_1) & \cdots & h_1(x_k) \\ \vdots & \ddots & \vdots \\ h_k(x_1) & \cdots & h_k(x_k) \end{vmatrix} \ge 0$

holds for all $x_1,\ldots,x_k \in I$ with $x_1 < x_2 < \cdots < x_k$. If the inequality in (3.5)
is strict, then $\{h_1,\ldots,h_k\}$ is called a Chebyshev system. It is well known (see
Karlin and Studden [20], Theorem II 10.2) that if $\{h_1,\ldots,h_k\}$ is a Chebyshev
system, then there exists a unique function, say $\sum_{i=1}^k c_i^* h_i(x) = c^{*T}h(x)$
($h = (h_1,\ldots,h_k)^T$), with the following properties:

(3.6)  (i) $|c^{*T}h(x)| \le 1 \quad \forall x \in I$;
       (ii) there exist $k$ points $x_1^* < \cdots < x_k^*$ such that
            $c^{*T}h(x_i^*) = (-1)^i, \quad i = 1,\ldots,k.$

The function $c^{*T}h(x)$ is called Chebyshev polynomial, and we say that it is
alternating at the points $x_1^*,\ldots,x_k^*$. The points $x_1^*,\ldots,x_k^*$ are called Chebyshev
points and need not be unique. They are unique in most applications,
in particular if $1 \in \mathrm{span}\{h_1,\ldots,h_k\}$, $k \ge 1$ and $I$ is a bounded and closed
interval, where in this case $x_1^* = \min_{x\in I} x$, $x_k^* = \max_{x\in I} x$. It is well known
(see Studden [36], Pukelsheim and Studden [28] or Imhof and Studden [19],
among others) that in many cases c-optimal designs in regression models
are supported at the Chebyshev points. The following result shows that a
similar statement can be made for T-optimal designs.



Theorem 3.3. Assume that (3.1) is satisfied, that the design space is
an interval, say $\mathcal{X} = I \subset \mathbb{R}$, and that $\{f_1,\ldots,f_{m_2}\}$ is a Chebyshev system on
the interval $I$. In this case the set $\mathcal{A}$ defined in (2.7) has at least $m_2 + 1$
points.

Moreover, assume that additionally $\{f_1,\ldots,f_{m_2},\eta\}$ is also a Chebyshev
system on the interval $I$ and

$\begin{vmatrix} f_1(x_1) & \cdots & f_1(x_{m_2}) & 0 \\ \vdots & & \vdots & \vdots \\ f_{m_2}(x_1) & \cdots & f_{m_2}(x_{m_2}) & 0 \\ \eta(x_1) & \cdots & \eta(x_{m_2}) & 1 \end{vmatrix} \neq 0$

for all $x_1,\ldots,x_{m_2} \in I$ satisfying $x_1 < \cdots < x_{m_2}$. Let $x_1^* < \cdots < x_{m_2+1}^*$ denote
$m_2 + 1$ Chebyshev points satisfying (3.6) and define $\xi^*$ as the design which
has weights

$w_i = \frac{|u_i|}{\sum_{j=1}^{m_2+1} |u_j|}$

at the points $x_i^*$ ($i = 1,\ldots,m_2+1$), where $u = (u_1,\ldots,u_{m_2+1})^T = (X^TX)^{-1}X^T e_{m_2+1}$
and the matrix $X$ is defined by

$X = \begin{pmatrix} f_1(x_1^*) & \cdots & f_1(x_{m_2+1}^*) \\ \vdots & \ddots & \vdots \\ f_{m_2+1}(x_1^*) & \cdots & f_{m_2+1}(x_{m_2+1}^*) \end{pmatrix}$

(here we put $f_{m_2+1} = \eta$). Then $\xi^*$ is a T-optimal design.

Proof. It follows from Theorem 1.1 in Chapter IX of Karlin and Studden
[20] that the best uniform approximation of the function $\eta$ by functions
of the form $\eta_2(x,\theta_{(2)}) = \theta_{(2)}^T f(x)$ is unique. By Theorem 2.1 the support of
a T-optimal design is contained in the set

$\mathcal{A} = \Big\{ x \in I \;\Big|\; \Big|\eta(x) - \sum_{j=1}^{m_2} \bar\theta_{(2)j} f_j(x)\Big| = \Big\|\eta - \sum_{j=1}^{m_2} \bar\theta_{(2)j} f_j\Big\|_\infty \Big\},$

where the parameters $\bar\theta_{(2)1},\ldots,\bar\theta_{(2)m_2}$ correspond to the best uniform
approximation of $\eta$ by linear combinations of $f_1,\ldots,f_{m_2}$. Theorem 1.1 in Karlin
and Studden [20] also shows that the cardinality of the set $\mathcal{A}$ is at least
$m_2 + 1$ and the first assertion follows.

For a proof of the second part we note that by Theorem 3.1 the T-optimal
design problem is equivalent to the $D_1$-optimal design problem in the extended
regression model (3.2). Because this is exactly the $e_{m_2+1}$-optimal
design problem it follows from Kiefer and Wolfowitz [22] (see also Studden
[36]) that the T-optimal design is supported at $m_2 + 1$ points satisfying (3.6).
The formula for the corresponding weights is now a direct consequence of
Corollary 8.9 in Pukelsheim [27]. □
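A small numerical illustration of weights of this type may help. The sketch assumes that the weights are proportional to the absolute values of the solution $u$ of $Xu = e_{m_2+1}$, where column $j$ of $X$ holds the extended regression vector at the $j$th Chebyshev point; this is the Elfving-type form commonly used for c-optimal designs and is stated here as an assumption, not as the paper's exact formula. The test case (discriminating $\eta(x) = x^2$ from the linear model with $f(x) = (1,x)^T$ on $[-1,1]$, with Chebyshev points $-1, 0, 1$) and the names `solve` and `chebyshev_weights` are likewise introduced only for this illustration.

```python
from typing import Callable, List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (assumes a nonsingular matrix)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def chebyshev_weights(funcs: Sequence[Callable[[float], float]],
                      points: Sequence[float]) -> List[float]:
    """Normalized |u_i| where X u = last unit vector and X[i][j] = funcs[i](points[j])."""
    k = len(points)
    x_mat = [[g(x) for x in points] for g in funcs]  # row i belongs to function i
    unit = [0.0] * (k - 1) + [1.0]
    u = solve(x_mat, unit)
    total = sum(abs(v) for v in u)
    return [abs(v) / total for v in u]


# Extended system: f_1 = 1, f_2 = x, and the competing mean function x^2 placed last.
quadratic_weights = chebyshev_weights(
    [lambda x: 1.0, lambda x: x, lambda x: x * x], [-1.0, 0.0, 1.0]
)
```

In this illustration the points $-1, 0, 1$ are the alternation points of $2x^2 - 1$, so they satisfy (3.6) for the system $\{1, x, x^2\}$.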

If, under the assumptions of Theorem 3.3, there exist exactly $m_2 + 1$
uniquely determined Chebyshev points, then any T-optimal design is supported
at precisely $m_2 + 1$ points. This situation is rather typical in applications.
Note that in Example 2.3 ($m_2 = 2$) the functions $\{1,x\}$ form a
Chebyshev system. Thus the first part of Theorem 3.3 implies that the set
$\mathcal{A}$ in (2.7) has at least cardinality 3 (in fact its cardinality is 4). On the
other hand, the system $\{1,x,x^3\}$ is not a Chebyshev system on the interval
$[-1,1]$, because the polynomial $x^3 - \tfrac{3}{4}x$ has 3 roots in the interval $[-1,1]$.
As a consequence the second part of Theorem 3.3 is not applicable here.
In fact, there exist an infinite number of T-optimal designs with 4 support
points, indicating that the Chebyshev property of the system $\{f_1,\ldots,f_{m_2},\eta\}$
is really necessary in this context.
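The failure of the Chebyshev property for $\{1,x,x^3\}$ on $[-1,1]$ can also be seen directly from the determinants in (3.5). The sketch below evaluates them on a coarse grid; a grid check can only disprove the property, never prove it, and the helper names (`det`, `system_det`, `strict_sign_on_grid`) are introduced here for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Func = Callable[[float], float]


def det(a: List[List[float]]) -> float:
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, pivot in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total


def system_det(funcs: Sequence[Func], points: Sequence[float]) -> float:
    """Determinant of the matrix (h_i(x_j)) for increasing points x_1 < ... < x_k."""
    return det([[h(x) for x in points] for h in funcs])


def strict_sign_on_grid(funcs: Sequence[Func], grid: Sequence[float],
                        tol: float = 1e-12) -> bool:
    """True if every increasing tuple from the grid gives a nonzero determinant
    of one common sign (a necessary condition for a Chebyshev system)."""
    signs = set()
    for pts in combinations(sorted(grid), len(funcs)):
        d = system_det(funcs, pts)
        if abs(d) <= tol:
            return False
        signs.add(d > 0)
    return len(signs) == 1


grid = [-1 + 0.25 * i for i in range(9)]
with_square = [lambda x: 1.0, lambda x: x, lambda x: x ** 2]
with_cube = [lambda x: 1.0, lambda x: x, lambda x: x ** 3]

passes_quadratic = strict_sign_on_grid(with_square, grid)
passes_cubic = strict_sign_on_grid(with_cube, grid)
left_det = system_det(with_cube, [-1.0, -0.5, 0.0])
right_det = system_det(with_cube, [0.0, 0.5, 1.0])
```

For $\{1,x,x^3\}$ the determinant factors as the Vandermonde product times $x_1 + x_2 + x_3$, which explains why its sign changes between triples on the left and on the right of the origin.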

In the following we specialize the result of Theorem 3.1 to the case where
the model $\eta_1$ is in fact an extension of the linear regression model (3.1), that
is,

(3.7)  $\eta(x) = \eta_1(x,\theta_{(1)}) = \theta_{(2)}^T f(x) + \theta_{(0)}^T g(x),$

where $g(x) = (g_1(x),\ldots,g_{m_0}(x))^T$ is a further vector of regression functions
and $m_0 + m_2 = m_1$. In this case, Theorem 3.1 can be slightly simplified.

Corollary 3.4. Assume that (3.1) and (3.7) are satisfied; then a design
$\xi^*$ is T-optimal if and only if it is $D_1$-optimal in the extended regression
model

(3.8)  $Y = \theta_{(2)}^T f(x) + \beta\phi(x) + \varepsilon,$

where $\phi(x) = \theta_{(0)}^T g(x)$.



Proof. From Theorem 3.1 and its proof it follows that a design is
T-optimal if and only if it maximizes

(3.9)  $\frac{\det \tilde M(\xi)}{\det M(\xi)} = \frac{G(\eta, f_{(2)1}, f_{(2)2},\ldots,f_{(2)m_2})}{G(f_{(2)1}, f_{(2)2},\ldots,f_{(2)m_2})} = \frac{G(\theta_{(0)}^T g, f_{(2)1}, f_{(2)2},\ldots,f_{(2)m_2})}{G(f_{(2)1}, f_{(2)2},\ldots,f_{(2)m_2})},$

where the matrix $\tilde M(\xi)$ is defined by (3.4). The last equality follows from
(3.7) and the multilinearity of the Gram determinant. Therefore the
T-optimal design is $D_1$-optimal in the regression model (3.8). □

We conclude this section with an alternative interpretation of the
T-optimality criterion as a compound criterion in the situation considered in
Corollary 3.4. To be precise, we define the $m_0 = m_1 - m_2$ regression models

$Y = \theta_{(2)}^T f(x) + \beta_j g_j(x) + \varepsilon, \quad j = 1,\ldots,m_0.$

Then, by Theorem 3.1, the T-optimal design for discriminating between $\eta_2$
and the $j$th model $(\theta_{(2)}^T,\beta_j)\tilde f_j(x)$ with $\tilde f_j(x) = (f^T(x),g_j(x))^T$ maximizes

$\Delta_j(\xi) = \frac{\det \tilde M_j(\xi)}{\det M(\xi)} = \frac{G(g_j, f_{(2)1},\ldots,f_{(2)m_2})}{G(f_{(2)1},\ldots,f_{(2)m_2})},$

where

$\tilde M_j(\xi) = \int_{\mathcal{X}} \tilde f_j(x) \tilde f_j^T(x) \, d\xi(x)$

and the matrix $M(\xi)$ is defined in (3.3). The proof of the next result is now
a direct consequence of the representation (3.9) and the multilinearity of the
Gram determinant.

Corollary 3.5. A T-optimal design for discriminating between the
models (3.1) and (3.7) maximizes the weighted average

$\Delta(\xi) = \sum_{j=1}^{m_0} \theta_{(0)j} \, \Delta_j(\xi),$

where $\theta_{(0)j}$ denotes the $j$th component of the vector $\theta_{(0)}$ in (3.7).

Note that by Corollary 3.5 the T-optimality criterion for discriminating
between the models (3.1) and (3.7) can be interpreted as a compound optimality
criterion in the sense of Läuter [23] and therefore results for calculating
optimal designs with respect to compound criteria can be used to find
T-optimal designs (see, e.g., Pukelsheim [27], Cook and Wong [9] or Clyde
and Chaloner [8], among many others).

4. Further discussion. 

4.1. Some comments on nonlinear models. As mentioned before, in general
Theorems 2.1 and 2.2 link the T-optimal design problem to a problem
in nonlinear approximation theory, which has a long history in mathematics
(see Braess [7] or Rice [30]), and is substantially more difficult to analyze
compared to the linear case considered in Section 3. We will now indicate
how this theory can be used to transfer some of the results of Section 3 to
the nonlinear case. For this we assume that the design space $\mathcal{X}$ is an interval
and that the function $\eta_2$ is continuous on $\mathcal{X} \times \Theta_{(2)}$. The following definition
is taken from Rice [30].

Definition 4.1. The class of functions $\mathcal{M} = \{\eta_2(\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \Theta_{(2)}\}$
has property Z of degree $m = m(\theta^*_{(2)})$ at the point $\theta^*_{(2)} \in \Theta_{(2)}$, if for any
$\theta_{(2)} \in \Theta_{(2)}$ with $\theta_{(2)} \neq \theta^*_{(2)}$ the difference $\eta_2(x,\theta^*_{(2)}) - \eta_2(x,\theta_{(2)})$ has at most
$m - 1$ zeros in $\mathcal{X}$.

The class of functions $\{\eta_2(\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \Theta_{(2)}\}$ is called locally solvent of
degree $m = m(\theta^*_{(2)})$ at the point $\theta^*_{(2)} \in \Theta_{(2)}$, if given a set $\{x_1,\ldots,x_m\} \subset \mathcal{X}$
and $\varepsilon > 0$, there exists a number $\delta = \delta(\theta^*_{(2)},\varepsilon,x_1,\ldots,x_m) > 0$ such that the
inequalities

$|Y_i - \eta_2(x_i,\theta^*_{(2)})| < \delta \quad (i = 1,\ldots,m)$

imply the existence of a solution $\theta_{(2)} \in \Theta_{(2)}$ of the system of nonlinear
equations

$\eta_2(x_i,\theta_{(2)}) = Y_i, \quad i = 1,2,\ldots,m,$

which satisfies

$\|\eta_2(\cdot,\theta^*_{(2)}) - \eta_2(\cdot,\theta_{(2)})\|_\infty < \varepsilon.$

The class $\mathcal{M}$ is called varisolvent if at each point the local solvency property
and property Z are satisfied with the same degree.

Examples of varisolvent families include sums of exponentials and rational
functions (see Rice [30]). If the class of functions $\{\eta_2(\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \Theta_{(2)}\}$
is varisolvent, the following result gives a rough estimate of the number of
support points of the T-optimal design. The proof can be found in Braess [7].

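Theorem 4.2. Assume that the class of functions $\mathcal{M} = \{\eta_2(\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \Theta_{(2)}\}$
is varisolvent and that $\eta$ is a continuous function on $\mathcal{X}$ such that
$\eta - \eta_2(\cdot,\theta_{(2)})$ is not constant. The function $\eta_2(\cdot,\theta^*_{(2)})$ is a best approximation of
the function $\eta$ if and only if the difference $\eta - \eta_2(\cdot,\theta^*_{(2)})$ alternates
$m(\theta^*_{(2)}) + 1$ times, that is, there exist at least $m(\theta^*_{(2)}) + 1$ points
$x^*_0 < \cdots < x^*_{m(\theta^*_{(2)})}$ in $\mathcal{X}$ such that

$$\eta(x^*_i) - \eta_2(x^*_i,\theta^*_{(2)}) = \varepsilon(-1)^i \|\eta - \eta_2(\cdot,\theta^*_{(2)})\|_\infty, \qquad i = 0,\dots,m(\theta^*_{(2)}),$$

where $\varepsilon \in \{-1,1\}$.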
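The alternation property in Theorem 4.2 can be inspected numerically on a grid. The following sketch is only an illustration under stated assumptions: the helper `count_alternations`, the grid and the tolerance are hypothetical choices, and the test function is the classical linear case $x^3 - \tfrac{3}{4}x = T_3(x)/4$ on $[-1,1]$ (the error of the best linear approximation of $x^3$), not one of the nonlinear families discussed here.

```python
import numpy as np

def count_alternations(residual, tol=1e-9):
    """Count alternating extremal points of a residual sampled on a grid.

    Returns the number of sign runs among the grid points where |residual|
    attains its maximum (up to `tol`), together with their indices.
    """
    residual = np.asarray(residual, dtype=float)
    sup_norm = np.max(np.abs(residual))
    # Grid points at which the sup-norm is attained.
    extremal = np.flatnonzero(np.abs(residual) >= sup_norm - tol)
    signs = np.sign(residual[extremal])
    # Neighbouring extremal points with equal sign are counted once.
    runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))
    return runs, extremal

x = np.linspace(-1.0, 1.0, 2001)
r = x**3 - 0.75 * x          # equals T_3(x) / 4
n_alt, idx = count_alternations(r)
print(n_alt, x[idx])
```

On a finite grid the count depends on the tolerance and on whether the extremal points are grid points, so the result should be read as a diagnostic rather than a proof of alternation.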

Theorem 4.2 gives some hint of the number of support points of the T-optimal
design. By this result, there exists a best approximation of $\eta$ by
functions of the form $\eta_2(\cdot,\theta_{(2)})$ ($\theta_{(2)} \in \Theta_{(2)}$), such that
$r^* = \eta - \eta_2(\cdot,\theta^*_{(2)})$ alternates at least $m(\theta^*_{(2)}) + 1$ times. In many cases
there are no other points in $\mathcal{X}$ where the difference $r^*$ attains its maximum,
and it follows from Theorem 2.1 that the T-optimal design has at most
$m(\theta^*_{(2)}) + 1$ support points. We illustrate this heuristic argument by an
example, where we consider sums of exponentials.

Example 4.3. Assume that

$$\eta_2(x,\theta_{(2)}) = \sum_{k=1}^{m_2} \theta_{(2)2k-1}\, e^{-\theta_{(2)2k} x},$$

where $x \in \mathcal{X} \subset [0,\infty)$, $\theta_{(2)2k-1} \in \mathbb{R}$, $\theta_{(2)2k} \in \mathbb{R}^+$ ($k = 1,\dots,m_2$) and the
design space is a compact interval. Models of this type have numerous
applications in pharmacokinetics (see, e.g., Shargel and Yu [31] or Rowland [29]).
It follows from Braess [7], pages 190-191, that for each

$$u(x) = \sum_{j=1}^{l} b_j e^{-\lambda_j x}$$

with $b_1,\dots,b_l \neq 0$, the class of functions
$\mathcal{F} = \{\eta_2(\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \mathbb{R}^{2m_2};\ \theta_{(2)2j} \in \mathbb{R}^+;\ j = 1,\dots,m_2\}$
is locally solvent at $u$ of order $m_2 + l$. Similarly, the class $\mathcal{F}$
has property Z of degree $m_2 + l$ at $u$ and therefore it is varisolvent at
$u$ of degree $m_2 + l$. If $\eta = \eta_1$ is a continuous function and $\eta_2(\cdot,\theta^*_{(2)})$ is the
best approximation of $\eta$, it follows from Theorem 4.2 that the difference
$\eta - \eta_2(\cdot,\theta^*_{(2)})$ alternates (at least) $m(\theta^*_{(2)}) + 1 = m_2 + l(\theta^*_{(2)}) + 1$ times,
where $l(\theta^*_{(2)})$ denotes the number of non-vanishing coefficients among
$\theta^*_{(2)1}, \theta^*_{(2)3}, \dots, \theta^*_{(2)2m_2-1}$ in $\eta_2(x,\theta^*_{(2)})$. By Theorem 2.1 the support
points of a T-optimal design must be among the points where the function
$\eta - \eta_2(\cdot,\theta^*_{(2)})$ attains its maximum. If none of the coefficients
$\theta^*_{(2)2j-1}$ vanishes, the cardinality of the set $\mathcal{A}$ in (2.7) is at least $2m_2 + 1$.

The upper bound on the cardinality of the set $\mathcal{A}$ depends on the particular
properties of the function $\eta = \eta_1$ and is in many cases close to the lower
bound $2m_2 + 1$. For example, if $\eta_1$ is also a sum of exponentials, say

$$\eta_1(x,\theta_{(1)}) = \sum_{j=1}^{m_1} \theta_{(1)2j-1}\, e^{-\theta_{(1)2j} x},$$

$\theta_{(1)2j-1} \in \mathbb{R}$, $\theta_{(1)2j} \in \mathbb{R}^+$, where $m_1 = m_2 + m_0 > m_2$, the difference
$r^* = \eta_1 - \eta_2(\cdot,\theta^*_{(2)})$ consists of at most $m_1 + m_2$ different exponential terms.
Because of the Chebyshev property of the functions $\{e^{-\lambda_j x} \mid j = 1,\dots,l\}$ on the
nonnegative line $(0,\infty)$ (see Karlin and Studden [20]) it follows that the derivative
of the difference $r^*$ (which is also a sum of at most $m_1 + m_2$ exponential
terms) has at most $m_1 + m_2 - 1$ roots. Observing that $\lim_{x\to\infty} r^*(x) = 0$ it
therefore follows that there exist at most $m_1 + m_2$ alternating points of the
difference $r^*$. Moreover, if the cardinality of the set $\mathcal{A}$ is exactly $m_1 + m_2$,
then a boundary point of the design space $\mathcal{X}$ is an element of the set $\mathcal{A}$.
Consequently any T-optimal design has at most $m_1 + m_2$ support points.
Note that the number of parameters in the exponential models $\eta_1$ and $\eta_2$ is
$2m_1$ and $2m_2$, respectively. Because $m_2 < m_1$ the T-optimal design cannot
be used to estimate all parameters in the extended model $\eta_1$. For example,
if $m_1 = m_2 + 1$, it follows from these arguments that a T-optimal design
has precisely $2m_2 + 1$ support points, although the model $\eta_1$ has $2m_2 + 2$
parameters.
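The root-counting step of Example 4.3 rests on the fact that a sum of $n$ exponentials with distinct rates has at most $n - 1$ real zeros. A small numerical illustration is sketched here; the coefficients and rates are hypothetical values chosen only for the demonstration, and sign changes on a finite grid merely approximate the number of roots.

```python
import numpy as np

def exp_sum(x, coeffs, rates):
    """Evaluate sum_j coeffs[j] * exp(-rates[j] * x) for an array x."""
    x = np.asarray(x, dtype=float)[:, None]
    return (np.asarray(coeffs) * np.exp(-np.asarray(rates) * x)).sum(axis=1)

def sign_changes(values):
    """Number of sign changes in a sequence, ignoring exact zeros."""
    s = np.sign(values)
    s = s[s != 0]
    return int(np.count_nonzero(s[1:] != s[:-1]))

# Hypothetical exponential sum r(x) = e^{-x} - 3 e^{-2x} + 2.5 e^{-3x}.
coeffs = np.array([1.0, -3.0, 2.5])
rates = np.array([1.0, 2.0, 3.0])

# The derivative is again an exponential sum with the same rates.
d_coeffs = -coeffs * rates

x = np.linspace(0.0, 10.0, 10001)
n_roots = sign_changes(exp_sum(x, d_coeffs, rates))
bound = len(rates) - 1   # at most n - 1 roots for n exponential terms
print(n_roots, bound)
```

In this instance the derivative attains the bound, so the exponential sum has two interior extrema on the nonnegative line.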

4.2. T-optimality based on the Kullback-Leibler distance. Recently Lopez-
Fidalgo, Tommasi and Trandafir [24] considered a generalization of the
T-optimality criterion, which is based on the popular Kullback-Leibler (KL)- 
distance. The general criterion addresses the problem of a nonnormal er- 
ror distribution and heteroscedasticity in model (2.1). It reduces to the T- 
criterion in the case of normal and homoscedastic data. We briefly indicate 
that the results of the previous sections can be easily extended to this more 
general class of optimality criteria. 
Following Lopez-Fidalgo, Tommasi and Trandafir [24] we specify the two
different models by their densities, say $f_j(y,x,\theta_{(j)},\sigma^2)$; $j = 1,2$,
where $\sigma^2$ is a nuisance parameter corresponding to the variances of the
responses. We fix one model, say $f(y,x,\sigma^2) = f_1(y,x,\theta_{(1)},\sigma^2)$, and consider
for a design $\xi$ the optimality criterion

(4.1) $$\Delta_{KL}(\xi) = \inf_{\theta_{(2)} \in \Theta_{(2)}} \int d_{KL}(f,f_2,x,\theta_{(2)})\,d\xi(x),$$

where (for any $x \in \mathcal{X}$)

$$d_{KL}(f,f_2,x,\theta_{(2)}) = \int f(y,x,\sigma^2) \log\left(\frac{f(y,x,\sigma^2)}{f_2(y,x,\theta_{(2)},\sigma^2)}\right) dy$$

denotes the KL-distance between the "true" model $f$ and the alternative
model $f_2(y,x,\theta_{(2)},\sigma^2)$. A KL-optimal design $\xi^*_{KL}$ maximizes $\Delta_{KL}(\xi)$ in the
class of all designs. The goal of this criterion is to determine designs
maximizing the power of the likelihood ratio test for the hypotheses

$$H_0\colon f(y,x,\sigma^2) = f_2(y,x,\theta_{(2)},\sigma^2) \quad \text{vs.} \quad H_1\colon f(y,x,\sigma^2) = f_1(y,x,\theta_{(1)},\sigma^2)$$

for the "worst" choice $\theta_{(2)} \in \Theta_{(2)}$. Similar arguments as given in the proof
of Theorem 2.1 show that

$$\sup_{\xi} \Delta_{KL}(\xi) = \inf_{\theta_{(2)} \in \Theta_{(2)}} \|d_{KL}(f,f_2,\cdot,\theta_{(2)})\|_\infty = \|d_{KL}(f,f_2,\cdot,\theta^*_{(2)})\|_\infty = \Delta_{KL}(\xi^*_{KL}),$$

where $\theta^*_{(2)}$ corresponds to the minimum in (4.1) for the design $\xi^*_{KL}$, and the
support of a KL-optimal design satisfies

$$\mathrm{supp}(\xi^*_{KL}) \subset \mathcal{A}_{KL} = \{x \in \mathcal{X} \mid d_{KL}(f,f_2,x,\theta^*_{(2)}) = \|d_{KL}(f,f_2,\cdot,\theta^*_{(2)})\|_\infty\}.$$

This means that the KL-optimal design problem is closely related to the
problem of determining the best uniform approximation of the function
$\eta \equiv 0$ by the (nonlinear) parametric family

(4.2) $$\{d_{KL}(f,f_2,\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \Theta_{(2)}\}.$$

Therefore, all results of the previous sections remain valid, where the class
$\{\eta_2(\cdot,\theta_{(2)}) \mid \theta_{(2)} \in \Theta_{(2)}\}$ has to be replaced by the set defined in (4.2) and
the function $\eta = \eta_1$ is given by $\eta(x) \equiv 0$. We will illustrate these ideas with
an example for heteroscedastic regression models with normally distributed
responses.

Example 4.4. We consider the problem of discriminating between two
regression models with heteroscedastic but normally distributed errors, that
is,

$$P_j^{Y|x} \sim \mathcal{N}\bigl(\eta_j(x,\theta_{(j)}),\ (1 - x^2)^{-1}\bigr), \qquad j = 1,2,$$

where $\eta_1(x,\theta_{(1)}) = \eta(x)$ is a cubic, $\eta_2(x,\theta_{(2)}) = \theta_{(2)1} + \theta_{(2)2}x$ a linear
polynomial and the explanatory variable satisfies $x \in (-1,1)$. D-optimal
designs for polynomial regression models with variance function $(1 - x^2)^{-1}$
have been studied extensively in the literature (see, e.g., Fedorov [16]), but
discrimination designs have not been considered so far. If $f_j(y,x,\theta_{(j)})$
denotes the density of $P_j^{Y|x}$ with respect to the Lebesgue measure it follows
by a straightforward but tedious calculation that

(4.3) $$d_{KL}(f,f_2,x,\theta_{(2)}) = (1 - x^2)(8x^3 - \theta_{(2)2}x - \theta_{(2)1})^2,$$

and the best uniform approximation of the function $\eta \equiv 0$ by functions of
the form (4.3) is unique and given by $d_{KL}(f,f_2,x,\theta^*_{(2)}) = (8x^3 - 4x)^2(1 - x^2)$
with corresponding set

$$\mathcal{A}_{KL} = \left\{ -\frac{\sqrt{2+\sqrt{2}}}{2},\ -\frac{\sqrt{2-\sqrt{2}}}{2},\ \frac{\sqrt{2-\sqrt{2}}}{2},\ \frac{\sqrt{2+\sqrt{2}}}{2} \right\}$$

and $\|d_{KL}(f,f_2,\cdot,\theta^*_{(2)})\|_\infty = 1$. The analogue of Theorem 2.2 shows that all
KL-optimal designs are supported in $\mathcal{A}_{KL}$ and characterized by the analogue
of (2.8), which yields

$$\int \frac{\partial}{\partial\theta_{(2)}} d_{KL}(f,f_2,x,\theta^*_{(2)})\,d\xi(x) = -2\int (1 - x^2)(8x^3 - 4x)\binom{1}{x}\,d\xi(x) = 0.$$

A straightforward calculation shows that all KL-optimal designs are given
by the one-parametric class

$$\begin{pmatrix} -\dfrac{\sqrt{2+\sqrt{2}}}{2} & -\dfrac{\sqrt{2-\sqrt{2}}}{2} & \dfrac{\sqrt{2-\sqrt{2}}}{2} & \dfrac{\sqrt{2+\sqrt{2}}}{2} \\[2ex] p & \dfrac{(2-\sqrt{2}) + 4p(\sqrt{2}-1)}{4} & \dfrac{\sqrt{2} - 4p(\sqrt{2}-1)}{4} & \dfrac{1}{2} - p \end{pmatrix},$$

where $p \in [0,\tfrac{1}{2}]$. We finally note that the algorithm proposed by Lopez-
Fidalgo, Tommasi and Trandafir [24] yields the 3-point design obtained
for $p = 1/2$, which cannot be used for estimating the parameters in the cubic
model.
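The statements of Example 4.4 can be checked numerically. The sketch below assumes the distance (4.3) with cubic term $8x^3$, the support points $\pm\sqrt{2\pm\sqrt{2}}/2$ and the weights $p$, $((2-\sqrt{2})+4p(\sqrt{2}-1))/4$, $(\sqrt{2}-4p(\sqrt{2}-1))/4$, $1/2-p$; the function names are hypothetical. Since (4.3) is quadratic in $\theta_{(2)}$, the infimum in (4.1) is obtained here by weighted least squares.

```python
import numpy as np

a = np.sqrt(2.0 + np.sqrt(2.0)) / 2.0   # cos(pi/8)
b = np.sqrt(2.0 - np.sqrt(2.0)) / 2.0   # cos(3*pi/8)
support = np.array([-a, -b, b, a])

def weights(p):
    """Weights of the one-parametric family, ordered like `support`."""
    r2 = np.sqrt(2.0)
    return np.array([p,
                     ((2.0 - r2) + 4.0 * p * (r2 - 1.0)) / 4.0,
                     (r2 - 4.0 * p * (r2 - 1.0)) / 4.0,
                     0.5 - p])

def d_kl(x, theta1, theta2):
    """Distance (4.3): cubic 8x^3 against the line theta1 + theta2 * x."""
    return (1.0 - x**2) * (8.0 * x**3 - theta2 * x - theta1) ** 2

def delta_kl(x, w):
    """Criterion (4.1) for a finite design; returns (value, minimising theta)."""
    v = w * (1.0 - x**2)                        # design weight times 1/variance
    F = np.column_stack([np.ones_like(x), x])   # regressors of the linear model
    A = F.T @ (v[:, None] * F)
    rhs = F.T @ (v * 8.0 * x**3)
    theta = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return float(np.sum(w * d_kl(x, theta[0], theta[1]))), theta

for p in (0.0, 0.1, 0.25, 0.5):
    value, theta = delta_kl(support, weights(p))
    print(p, round(value, 10), np.round(theta, 10))

# A design off the extremal set, for comparison (hypothetical choice).
other_value = delta_kl(np.array([-0.9, -0.3, 0.3, 0.9]), np.full(4, 0.25))[0]
```

For every $p$ in the family the minimizing parameter is $(0,4)$ and the criterion value equals the sup-norm $1$, while the comparison design stays below it.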



5. Examples. In this section we compare T- and $D_s$-optimal designs
with respect to their power properties and estimation error by means of
a simulation study. We begin with the case of discriminating between two
polynomials of degree $m_2 - 1$ and $m_1 - 1 = m_0 + m_2 - 1$ on a nonnegative
interval. Our second example considers a nonlinear case, namely exponential
regression models.
5.1. Polynomial regression. Consider the polynomial regression models

$$\eta_2(x,\theta_{(2)}) = \theta_{(2)1} + \theta_{(2)2}x + \cdots + \theta_{(2)m_2}x^{m_2-1},$$

$$\eta_1(x,\theta_{(1)}) = \theta_{(1)1} + \theta_{(1)2}x + \cdots + \theta_{(1)m_2}x^{m_2-1} + \cdots + \theta_{(1)m_0+m_2}x^{m_0+m_2-1},$$

where the explanatory variable $x$ varies in a nonnegative interval, say
$I \subset [0,\infty)$. Note that under the additional assumption of positive coefficients
$\theta_{(1)m_2+1},\dots,\theta_{(1)m_0+m_2}$ the two systems of functions

(5.1) $$\{1,x,\dots,x^{m_2-1},\ \theta_{(1)1} + \theta_{(1)2}x + \cdots + \theta_{(1)m_0+m_2}x^{m_0+m_2-1}\}, \qquad \{1,x,\dots,x^{m_2-1}\}$$

form a Chebyshev system on the interval $I$ and that the number of
corresponding Chebyshev points is exactly $m_2 + 1$ (see Karlin and Studden [20],
page 9). Consequently, Theorem 3.3 is applicable here and any T-optimal
design is supported at $m_2 + 1$ points. We note that in the case $m_0 > 1$ the
T-optimal design cannot be used for the F-test, which is commonly applied
to discriminate between the two nested polynomials and requires at least
$m_0 + m_2$ different design points. Note also that this problem was already
observed by Atkinson and Donev [2] in the case $m_2 = 1$ and $m_0 = 2$ (see
Example 20.2 in this reference). The results in the present paper show that this
situation is not an exception but rather typical for discrimination designs
constructed from the T-optimality criterion.

If the system in (5.1) is not a Chebyshev system the results of Sections
2 and 3 indicate that there exist several T-optimal designs. For example,
consider the case of discriminating between a linear and a cubic polynomial,
that is, $m_2 = 2$, $m_0 = 2$ on the interval $[-1,1]$. For the cubic model we
investigate the model

(5.2) $$\eta(x) = 1 + x + c_0x^2 + d_0x^3.$$

Some T-optimal designs for various values of the parameters $c_0$ and $d_0$ are
given in Table 2.
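One of the designs reported in Table 2 can be verified directly. The sketch below takes the case $(c_0,d_0) = (1,1)$ with support $-1$, $1/3$, $1$ (the tabulated $0.33$ read as $1/3$) and weights $1/6$, $1/2$, $1/3$. It assumes that the characterization of T-optimal designs in the linear case amounts to equioscillation of the residual together with orthogonality of the weighted residual to the regressors $(1,x)$; the candidate line $11/27 + x$ is the one for which the residual of $x^2 + x^3$ equioscillates, the terms $1 + x$ of (5.2) being absorbed by the linear model.

```python
import numpy as np

def residual(x):
    """x^2 + x^3 (c0 = d0 = 1) minus the candidate line 11/27 + x."""
    return x**2 + x**3 - (11.0 / 27.0 + x)

grid = np.linspace(-1.0, 1.0, 20001)
sup_norm = np.max(np.abs(residual(grid)))          # expected 16/27

support = np.array([-1.0, 1.0 / 3.0, 1.0])
weights = np.array([1.0 / 6.0, 1.0 / 2.0, 1.0 / 3.0])
r = residual(support)

# Orthogonality of the weighted residual to the regressors (1, x).
moment_conditions = np.array([np.sum(weights * r),
                              np.sum(weights * r * support)])

# Inner minimisation of the T-criterion: weighted least squares fit of a line.
F = np.column_stack([np.ones_like(support), support])
W = np.diag(weights)
target = support**2 + support**3
coef = np.linalg.solve(F.T @ W @ F, F.T @ W @ target)
t_value = float(np.sum(weights * (target - F @ coef) ** 2))

print(sup_norm, r, moment_conditions, coef, t_value)
```

The criterion value of the design coincides with the squared sup-norm of the residual, which is the equality one expects at a T-optimal design.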

The T-optimal design obtained from the algorithm of Atkinson and Fedorov
[3] for the parameters $c_0 = 0$ and $d_0 = 1$ has weights 1/6, 1/2 and 1/3
at the points -1/2, 1/2 and 1. This design corresponds to the choice $p = 1/6$
in Example 2.3 and will be called $T_{1/6}$-optimal design in this example. In
order to compare the different designs with respect to their ability to
discriminate between a cubic and a linear regression model by the common F-test
we have modified the $T_{1/6}$-optimal design slightly and have put 2% of the
observations at a fourth point, namely the left boundary of the design space. A
further T-optimal design with four support points is obtained from formula
(2.14) with $p = 1/3$ and denoted as $T_{1/3}$-optimal design. Stigler [34] proposed
Table 2

T-optimal designs for discriminating between a linear and a cubic polynomial given in
(5.2) for a special choice of the parameters $c_0$ and $d_0$ (the parameter $z$ satisfies
$z \in \mathbb{R} \setminus \{0\}$). In the case $(c_0,d_0) = (0,z)$ the T-optimal design is not uniquely determined

  c_0    d_0    x_1      x_2       x_3     w_1     w_2    w_3
  0      z      -0.5     0.5       1       1/6     1/2    1/3
  z      0      -1       0         1       1/4     1/2    1/4
  z      z      -1       0.33      1       1/6     1/2    1/3
  z      -z     -1       -0.33     1       1/3     1/2    1/6
  2z     z      -1       0.2       1       1/5     1/2    3/10
  z      2z     -0.77    0.411     1       1/6     1/2    1/3
  -2z    z      -1       -0.2      1       3/10    1/2    1/5
  z      -2z    -1       -0.411    0.77    1/3     1/2    1/6

the $D_2$-criterion for the construction of a discriminating design between a
linear and a cubic model. If $M_1(\xi)$ and $M_3(\xi)$ denote the information matrices
of a design $\xi$ in the linear and cubic model, respectively, the corresponding
$D_2$-optimal design maximizes $|M_3(\xi)|/|M_1(\xi)|$ and has weights 1/5, 3/10,
3/10 and 1/5 at the points -1, -0.408, 0.408 and 1 (see Studden [37]).
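The ratio $|M_3(\xi)|/|M_1(\xi)|$ is straightforward to evaluate for a design with finite support. The following sketch (function names are hypothetical) computes it for the $D_2$-optimal design quoted above and for the three-point T-optimal design with weights 1/6, 1/2, 1/3 at -1/2, 1/2, 1; it makes no attempt to carry out the maximization itself.

```python
import numpy as np

def info_matrix(points, weights, degree):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T with f(x) = (1, x, ..., x^degree)."""
    F = np.vander(np.asarray(points, dtype=float), degree + 1, increasing=True)
    return F.T @ (np.asarray(weights, dtype=float)[:, None] * F)

def d2_criterion(points, weights):
    """Ratio |M_3(xi)| / |M_1(xi)| for a linear model nested in a cubic one."""
    m3 = info_matrix(points, weights, 3)
    m1 = info_matrix(points, weights, 1)
    return np.linalg.det(m3) / np.linalg.det(m1)

d2_value = d2_criterion([-1.0, -0.408, 0.408, 1.0], [0.2, 0.3, 0.3, 0.2])
t_value = d2_criterion([-0.5, 0.5, 1.0], [1.0 / 6.0, 1.0 / 2.0, 1.0 / 3.0])
print(d2_value, t_value)
```

With only three support points the information matrix of the cubic model is singular, so the ratio vanishes for the T-optimal design; this is the numerical counterpart of the remark that such a design cannot be used to estimate all parameters of the cubic.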

We have conducted a small simulation study and generated normally
distributed random variables with mean given by (5.2) and variance $\sigma^2 = 0.1$,
where the design was either the $T_{1/3}$-optimal, the (modified) $T_{1/6}$-optimal
or the $D_2$-optimal design. In Figure 2 we display the power function of the
F-test for the hypothesis of a linear regression $H_0\colon (c_0,d_0) = (0,0)$ for
various choices of the parameters $c_0$ and $d_0$. The level is 5% and the sample
size is $n = 50$. We have considered three values for the parameter $c_0$ and
display the power as a function of the parameter $d_0$. The solid line corresponds
to the power function of the F-test based on the (modified) $T_{1/6}$-optimal
design, while the dotted and dashed line refer to the $T_{1/3}$-optimal and
$D_2$-optimal design, respectively. If $c_0 = 0$ the curves are almost identical if $d_0$ is




[Figure 2: three panels, $c_0 = 0$, $c_0 = 0.05$, $c_0 = 0.1$]

Fig. 2. Simulated rejection probabilities of the F-test $H_0\colon (c_0,d_0) = (0,0)$ based on the
$D_2$-optimal design (dashed line), the modified $T_{1/6}$-optimal design (solid line) and the
$T_{1/3}$-optimal design (dotted line) for the parameters $(c_0,d_0) = (0,1)$ in the cubic regression
model (5.2). The errors are centered normally distributed with variance $\sigma^2 = 0.1$.

[Figure 3: three panels, $d_0 = 0$, $d_0 = 0.05$, $d_0 = 0.1$]

Fig. 3. Simulated rejection probabilities of the F-test $H_0\colon (c_0,d_0) = (0,0)$ based on the
$D_2$-optimal design (dashed line), the modified $T_{1/6}$-optimal design (solid line) and the
$T_{1/3}$-optimal design (dotted line) for the parameters $(c_0,d_0) = (0,1)$ in the cubic regression
model (5.2). The errors are centered normally distributed with variance $\sigma^2 = 0.1$.



also small, while we observe some advantages for the $T_{1/3}$- and $D_2$-optimal
design for moderate and large values of $d_0$. Here the $T_{1/3}$-optimal design has
the best performance (see the left panel in Figure 2). The case of a positive
parameter $c_0 = 0.05$, $c_0 = 0.1$ corresponds to an alternative. For small values
of $d_0$, the $D_2$-optimal and $T_{1/3}$-optimal design seem to have better
discrimination properties than the $T_{1/6}$-optimal design, while the opposite behavior
is observed if $d_0$ is large (see the middle and right panel in Figure 2). Next
we consider the situation where $d_0$ is fixed and the parameter $c_0$ is varied. If
$d_0 = 0$ the $D_2$-optimal design always yields more power than both T-optimal
designs, where the $T_{1/3}$-optimal design shows some advantages compared to
the $T_{1/6}$-optimal design (see the left panel in Figure 3). For larger values of
$d_0$ the situation is similar. If $d_0 = 0.05$ all three designs yield very similar
results for small values of the parameter $c_0$, while for larger values of $c_0$ the
$T_{1/3}$- and $D_2$-optimal design yield more power than the $T_{1/6}$-optimal design.
Finally, in the case $d_0 = 0.1$ the $T_{1/6}$-optimal design should be preferred for
small values of $c_0$ if model discrimination is the main goal of the
experimenter. On the other hand, if $d_0$ is large, the $D_2$-optimal design has the
best performance and both T-optimal designs show a similar behavior (see
the right panel in Figure 3). Summarizing these observations, we conclude
that the superiority of one of the two discrimination designs depends
sensitively on the alternative under consideration. We finally also note that the
$D_2$-optimal design does not require any preliminary information regarding
the (unknown) parameters and that the modified $T_{1/6}$-optimal and the
$T_{1/3}$-optimal design were constructed for the particular alternative $(c_0,d_0) = (0,1)$
corresponding to the "true" model. Therefore we expect these designs to be
particularly powerful in the examples considered in the simulation study.
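A simulation of this kind can be sketched as follows. The code is not the authors' implementation: the rounding of $n \cdot w_i$ to integer replications (largest remainder rule), the number of runs and the seed are assumptions made for the illustration, and only the $D_2$-optimal design with weights 1/5, 3/10, 3/10, 1/5 at -1, -0.408, 0.408, 1 is evaluated.

```python
import numpy as np
from scipy import stats

def allocate(weights, n):
    """Integer replications summing to n (largest remainder rule; an assumption)."""
    raw = np.asarray(weights, dtype=float) * n
    counts = np.floor(raw).astype(int)
    order = np.argsort(raw - counts)[::-1]
    counts[order[: n - counts.sum()]] += 1
    return counts

def f_test_power(points, weights, c0, d0, n=50, sigma2=0.1,
                 level=0.05, runs=2000, seed=0):
    """Monte Carlo rejection rate of the F-test of H0: (c0, d0) = (0, 0)."""
    rng = np.random.default_rng(seed)
    x = np.repeat(np.asarray(points, dtype=float), allocate(weights, n))
    X1 = np.vander(x, 2, increasing=True)   # linear model
    X3 = np.vander(x, 4, increasing=True)   # cubic model
    mean = 1.0 + x + c0 * x**2 + d0 * x**3  # model (5.2)
    crit = stats.f.ppf(1.0 - level, 2, n - 4)
    rejections = 0
    for _ in range(runs):
        y = mean + rng.normal(0.0, np.sqrt(sigma2), size=n)
        rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
        rss3 = np.sum((y - X3 @ np.linalg.lstsq(X3, y, rcond=None)[0]) ** 2)
        f_stat = ((rss1 - rss3) / 2.0) / (rss3 / (n - 4))
        rejections += int(f_stat > crit)
    return rejections / runs

d2_design = ([-1.0, -0.408, 0.408, 1.0], [0.2, 0.3, 0.3, 0.2])
size = f_test_power(*d2_design, c0=0.0, d0=0.0)    # rejection rate under H0
power = f_test_power(*d2_design, c0=0.0, d0=1.0)   # alternative (0, 1)
print(size, power)
```

The F-test needs at least four distinct design points, which is why a three-point T-optimal design has to be modified before it can be passed to a routine of this type.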

Usually the next step after model identification is the statistical analysis 
based on the identified model. Therefore it is also of interest to investigate 
the performance of the three discrimination designs for this purpose. In Table 
3 we present the mean squared errors of the least squares estimates $\hat a$, $\hat b$, $\hat c$






Table 3

Mean squared error of the least squares estimates in the cubic regression model. The data
is obtained from the $D_2$-optimal, the $T_{1/3}$-optimal and (modified) $T_{1/6}$-optimal design for
the special choice of the parameters $(c_0,d_0) = (0,1)$. The variance is chosen as $\sigma^2 = 0.1$

            D_2-optimal design   Modified T_{1/6}-optimal design   T_{1/3}-optimal design
  MSE(a)    0.0050               0.0103                            0.0060
  MSE(b)    0.0290               0.0324                            0.0220
  MSE(c)    0.0120               0.0545                            0.0160
  MSE(d)    0.0360               0.0766                            0.0320



and $\hat d$ based on data obtained from a $D_2$-optimal design, the (modified)
$T_{1/6}$-optimal and the $T_{1/3}$-optimal design for the special choice of the parameters
$(c_0,d_0) = (0,1)$. The model under consideration is in fact the cubic regression
$1 + x + x^3$, for which the T-optimal designs were constructed. We observe
that the mean squared error of the estimates obtained from the (modified)
$T_{1/6}$-optimal design is substantially larger compared to the mean squared
error obtained from the $D_2$-optimal and $T_{1/3}$-optimal design. For the last
named designs the situation is very similar, where there are slight advantages
for the $D_2$-optimal design with respect to the estimation of the parameters
$a$ and $c$ and the opposite behavior can be observed for the estimates of the
parameters $b$ and $d$.

5.2. A nonlinear example. In this section we consider the problem of
discrimination between the exponential regression models

(5.3) $$\eta_1(x,\theta_{(1)}) = \theta_{(1)1}\exp(-\theta_{(1)2}x) + \theta_{(1)3}\exp(-\theta_{(1)4}x),$$

(5.4) $$\eta_2(x,\theta_{(2)}) = \theta_{(2)1}\exp(-\theta_{(2)2}x),$$

where the explanatory variable varies in the interval $\mathcal{X} = [-1,1]$. These
models have numerous applications in pharmacokinetics (see, e.g., Shargel and
Yu [31] or Rowland [29]) and optimal designs have been discussed
extensively in the recent literature (see Dette, Melas and Pepelysheff [15] or
Biedermann, Dette and Pepelysheff [6]). It follows by similar arguments as
given in Example 4.3 that a T-optimal design has at most three support
points. The T-optimal designs are listed in Table 4 for various combinations
of the parameters $\theta_{(1)j}$, $j = 1,\dots,4$.

We have again performed a small simulation study in order to study the 
rejection probabilities of the likelihood ratio test of the hypothesis 

(5.5) $$H_0\colon \theta_{(1)3} = 0,$$

where the data is generated by the different designs. Because this test re- 
quires measurements at at least 4 locations, we have modified the T-optimal 
designs by putting 2% of the observations at a fourth point. 
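The modification of a three-point design can be written down in a few lines. The sketch below is an assumption about how such a modification might be carried out: the remaining weights are rescaled proportionally, and the location of the fourth point (here 0.5) is a hypothetical choice, since it is not specified for this example. The design used is the Table 4 entry for $\theta_{(1)} = (1,-1,1,2)$.

```python
def add_point(points, weights, new_point, mass=0.02):
    """Move a fraction `mass` of the design to `new_point`.

    The existing weights are shrunk proportionally (an assumption), so the
    returned weights again sum to one.
    """
    if new_point in points:
        raise ValueError("new_point is already a support point")
    scaled = [(1.0 - mass) * w for w in weights]
    return list(points) + [new_point], scaled + [mass]

# T-optimal design from Table 4 for theta_(1) = (1, -1, 1, 2).
t_points = [-1.0, -0.272, 1.0]
t_weights = [0.168, 0.437, 0.395]

# Hypothetical location of the fourth point.
mod_points, mod_weights = add_point(t_points, t_weights, 0.5)
print(mod_points, mod_weights)
```

With $n = 50$ observations a mass of 2% corresponds to a single observation at the additional point.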
For a comparison, there are now two natural candidates based on the
$D_s$-optimality criterion. The first design is obtained by maximizing the power of
the test for the hypothesis (5.5) and corresponds to the $D_1$-criterion, while
the second design is a $D_2$-optimal design in the sense of Stigler [34], and
corresponds to the test for the hypothesis

(5.6) $$H_0\colon (\theta_{(1)3},\theta_{(1)4}) = (0,0).$$

The corresponding locally optimal designs are presented in Table 5.

We have simulated data according to the model $\eta_1$ and calculated the
power of the likelihood ratio test for the hypothesis (5.5) in various
situations. The errors are normally distributed with variance $\sigma^2 = 0.05$
($\theta_{(1)2} = -1$, $\theta_{(1)4} = -2$) and $\sigma^2 = 0.2$ ($\theta_{(1)2} = -1$, $\theta_{(1)4} = 2$; $\theta_{(1)2} = 2$, $\theta_{(1)4} = 4$), the
sample size is $n = 50$ and 1000 simulation runs are used to calculate the
rejection probabilities. Some typical results are depicted in Figure 4, which
shows the probability of rejection as a function of the parameter $\theta_{(1)3}$.



Table 4

T-optimal designs for discriminating between the exponential regression models given in
(5.3) and (5.4) for a special choice of the parameter $\theta_{(1)}$

  theta_(1) = (theta_(1)1, theta_(1)2, theta_(1)3, theta_(1)4)    x_1    x_2       x_3      w_1      w_2      w_3
  (1,2,1,4)                                                       -1     -0.8      -0.02    0.088    0.22     0.692
  (1,-1,1,-2)                                                     -1     0.6       1        0.645    0.246    0.109
  (1,-1,1,2)                                                      -1     -0.272    1        0.168    0.437    0.395
  (-1,1,-1,2)                                                     -1     -0.59     1        0.109    0.252    0.639
  (-1,-1,-1,-0.5)                                                 -1     0.35      1        0.394    0.425    0.181



Table 5

$D_s$-optimal designs, $s = 1,2$, for discriminating between the exponential regression models
given in (5.3) and (5.4) for a special choice of the parameter $\theta_{(1)}$

  (theta_(1)1, theta_(1)2, theta_(1)3, theta_(1)4)   s   x_1   x_2       x_3       x_4      w_1      w_2      w_3      w_4
  (1,2,1,4)                                          1   -1    -0.859    -0.394    0.717    0.087    0.197    0.257    0.459
                                                     2   -1    -0.838    -0.404    0.52     0.144    0.258    0.206    0.392
  (1,-1,1,-2)                                        1   -1    -0.03     0.758     1        0.293    0.346    0.249    0.112
                                                     2   -1    0.03      0.697     1        0.308    0.253    0.281    0.158
  (1,-1,1,2)                                         1   -1    -0.636    0.394     1        0.142    0.444    0.311    0.103
                                                     2   -1    -0.616    0.313     1        0.341    0.309    0.268    0.082
  (-1,1,-1,2)                                        1   -1    -0.758    0.03      1        0.112    0.249    0.346    0.293
                                                     2   -1    -0.697    -0.03     1        0.158    0.281    0.253    0.308
  (-1,-1,-1,-0.5)                                    1   -1    -0.273    0.657     1        0.215    0.631    0.29     0.134
                                                     2   -1    -0.242    0.576     1        0.324    0.271    0.275    0.13
[Figure 4: three panels, $\theta_{(1)2} = -1$, $\theta_{(1)4} = -2$; $\theta_{(1)2} = -1$, $\theta_{(1)4} = 2$; $\theta_{(1)2} = 2$, $\theta_{(1)4} = 4$]

Fig. 4. Simulated rejection probabilities of the likelihood ratio test for the
hypothesis (5.5) based on the $D_1$-optimal design (dashed line), $D_2$-optimal design (dotted line)
and the T-optimal design (solid line) in the exponential regression model (5.3), where
$\theta_{(1)1} = \theta_{(1)3} = 1$.

If both parameters in the exponential functions are negative (left panel
in Figure 4) the power of the test obtained from the modified T-optimal
design is larger than the power of the test based on the $D_2$-optimal design.
On the other hand the $D_2$-optimal design seems to have slightly better
discrimination properties than the $D_1$-optimal design in this case. If the
parameters are of opposite sign (middle panel in Figure 4) the situation is
different and the $D_2$-optimal design yields a bit more power for small values
of the parameter $\theta_{(1)3}$. In this example the $D_1$-optimal design is totally
defective. Finally, the right panel of Figure 4 shows a situation where both
parameters in the exponential functions are positive. Here almost the same
behavior as in the case of negative parameters is observed. While the
$D_2$-optimal design yields more power than the $D_1$-optimal design, the test based
on the (modified) T-optimal design shows the best performance. On the other
hand the $D_2$-optimal design advises the experimenter to take observations at
4 different locations and therefore it also allows the estimation of all
parameters in the extended model.

The impact of the discriminating designs on the parameter estimates is
investigated in Table 6, where we show two typical examples
of the simulated mean squared error of the parameter estimates under the
different designs. If $\theta_{(A)} = (1,-1,1,2)$ the $D_1$- and $D_2$-optimal designs yield
substantially smaller mean squared errors than the T-optimal design, and
the $D_1$-optimal design shows a slightly better performance than the
$D_2$-optimal design. In the case $\theta_{(B)} = (1,2,1,4)$ the $D_1$- and $D_2$-optimal design
yield the smallest mean squared errors, while the (modified) T-optimal design shows
again the worst performance. The mean squared errors obtained by the
$D_2$-optimal design are slightly larger than those obtained by the $D_1$-optimal
Table 6

Simulated mean squared error of the least squares estimates in the exponential regression
model. The data is obtained from the $D_1$-, $D_2$- and (modified) T-optimal design for the
special choice of the parameters $\theta_{(A)} = (1,-1,1,2)$ and $\theta_{(B)} = (1,2,1,4)$. The variance is
chosen as $\sigma^2 = 0.2$

                        D_1-optimal design   D_2-optimal design   T-optimal design
  theta_(A)   MSE(a)    0.04491              0.05507              0.31266
              MSE(b)    0.05468              0.06687              1.07216
              MSE(c)    0.02503              0.03137              0.15910
              MSE(d)    0.02414              0.02803              0.15603
  theta_(B)   MSE(a)    0.18217              0.18552              0.37235
              MSE(b)    0.57880              0.80178              2.94709
              MSE(c)    0.18374              0.17361              0.37019
              MSE(d)    0.25151              0.21136              0.43687



design. Summarizing these and similar results (which are not shown for the
sake of brevity) we conclude that the $D_1$- and $D_2$-optimal designs have good
properties for model discrimination and additionally have good properties
for parameter estimation if the null hypothesis (5.6) has been rejected. In
many cases the mean squared error of the parameter estimates obtained
from the modified T-optimal design is at least two times larger compared to
the results obtained from the $D_1$- and $D_2$-optimal designs.